Predicting Survival Probability of NASA Aircraft Engines

Using the Cox Proportional Hazards Model

Jayme Reed & Brad Paton (Advisor: Dr. Cohen)

April 15, 2025

Cox Proportional Hazards (CPH) Model

What is it?

  • A statistical regression method for modeling time-to-event outcomes with survival data (Abeysekera and Sooriyarachchi 2009)

    • Survival data has a time value and an indicator column for whether the event occurred
  • Can handle censored data

    • Data are censored when information about an individual in a study is known only for a certain period of time (Klein and Moeschberger 2005)
  • Primarily used in the health field, but also applied to predicting bank failure, the survival probability of machines, and the likelihood of insurance payouts

Limitations

Mathematical Formulas

  • Survival Function: \(S(x) = P(X > x) = \int_{x}^{\infty} f(u)\,du\)

  • Hazard Function: \(h(x) = \lim\limits_{\Delta x \rightarrow 0} \frac{P[x \leq X < x + \Delta x|X \geq x]}{\Delta x}\)

  • CPH Model Hazard Function: \(h(t|\mathbf Z) = h_0(t)\text{exp}(\sum\limits_{k=1}^{p} \beta_kZ_k)\)

  • Proportional Hazards Ratio: \(\frac{h(t|\mathbf{Z})}{h(t|\mathbf{Z}^*)} = \text{exp}[\sum\limits_{k=1}^{p} \beta_k(Z_k - Z_k^*)]\)

  • Cumulative Hazard Function: \(H(x) = \int_0^x h(u) du\)

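  • Concordance Index: \(C = \frac{c + \frac{t_x}{2}}{c + d + t_x}\), where \(c\) is the number of concordant pairs, \(d\) is the number of discordant pairs, and \(t_x\) is the number of pairs tied on the predictor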

  • Survival Probability: \(S(t) = e^{-H(t)}\), where \(H(t)\) is the above cumulative hazard function
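As a minimal sketch of how these formulas fit together: for covariates that do not change over time, integrating the CPH hazard gives \(H(t|\mathbf{Z}) = H_0(t)\text{exp}(\sum_k \beta_k Z_k)\), so \(S(t|\mathbf{Z}) = e^{-H_0(t)\text{exp}(\sum_k \beta_k Z_k)}\). The coefficients, covariate values, and baseline cumulative hazard below are hypothetical and are not estimates from the NASA data; the function names are illustrative.

```python
import math

def hazard_ratio(beta, z, z_star):
    """exp(sum_k beta_k * (z_k - z*_k)): hazard of subject z relative to subject z_star."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, z, z_star)))

def survival_probability(baseline_cum_hazard, beta, z):
    """S(t|Z) = exp(-H0(t) * exp(beta . Z)) for time-fixed covariates."""
    linear_predictor = sum(b * a for b, a in zip(beta, z))
    return math.exp(-baseline_cum_hazard * math.exp(linear_predictor))

# Hypothetical values for illustration only
beta = [0.5, -0.2]     # assumed coefficients
z = [1.0, 2.0]         # covariates of the subject of interest
z_star = [0.0, 2.0]    # covariates of a reference subject
H0_t = 0.3             # assumed baseline cumulative hazard at time t

hr = hazard_ratio(beta, z, z_star)
s = survival_probability(H0_t, beta, z)
print(f"hazard ratio: {hr:.4f}")
print(f"S(t | Z): {s:.4f}")
```

The hazard ratio depends only on the covariate difference, while the survival probability also needs the baseline cumulative hazard at the chosen time.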

Assumptions

There are four assumptions for CPH:

  • Independence assumption

    • Assumes that the survival times of observed subjects are independent of each other (Nahhas 2025)
  • Non-informative Censoring Assumption

    • Assumes that censoring is non-informative, meaning the reason a subject is censored is unrelated to that subject's risk of the event (Nahhas 2025)
  • Linearity Assumption

    • Assumes each continuous covariate has a linear relationship with the log hazard (Nahhas 2025)
  • Proportional Hazards Assumption

    • Assumes the ratio of the hazard rates for any two subjects is constant over time (Bustan 2018)
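A short derivation shows why the proportional hazards assumption is built into the CPH model form. For two subjects with time-fixed covariates \(\mathbf{Z}\) and \(\mathbf{Z}^*\), the baseline hazard \(h_0(t)\) cancels, so the ratio does not depend on \(t\):

\[
\frac{h(t|\mathbf{Z})}{h(t|\mathbf{Z}^*)}
= \frac{h_0(t)\,\text{exp}\left(\sum_{k=1}^{p} \beta_k Z_k\right)}{h_0(t)\,\text{exp}\left(\sum_{k=1}^{p} \beta_k Z_k^*\right)}
= \text{exp}\left[\sum_{k=1}^{p} \beta_k (Z_k - Z_k^*)\right]
\]

If a covariate's effect changes over time, this cancellation no longer yields a constant, which is what a check of the assumption looks for.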

Visualization

Two common model visualizations are:

  • Kaplan-Meier Survival Curve

  • Forest Plots

    • Visualizes the effects of each covariate in the model on the hazard ratio
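The Kaplan-Meier curve is drawn from the product-limit estimator, which multiplies the conditional survival fractions at each observed event time. The sketch below is a minimal standard-library illustration on made-up toy data, not the code used for this project; the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at time t (censored subjects are not counted)
        failures = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve

# Toy data for illustration only: 1 = event observed, 0 = censored
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored subjects leave the risk set without lowering the curve, which is how the estimator uses partial information.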

Evaluation and Survival Probability

  • Model accuracy is evaluated using the concordance index

    • The concordance index measures how well the model's predicted risk ordering agrees with the observed order of events

    • A value of 1 means all pairs are correctly ordered, a value of 0 means no pairs are correctly ordered, and a value of 0.5 corresponds to random ordering

  • Survival probability can be predicted at a specific time \(t\)

    • If the probability is \(\geq 50\)%, it is assumed the event has not occurred

    • If the probability is \(< 50\)%, it is assumed the event has occurred
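The evaluation steps above can be sketched in a few lines. This is a simplified illustration of the concordance formula \(C = \frac{c + t_x/2}{c + d + t_x}\) and of the 50% rule, not the implementation used by any particular package: pairs tied on time are skipped, the data are toy values, and both function names are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """C = (c + t_x / 2) / (c + d + t_x) over comparable pairs."""
    c = d = t_x = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's time (ties in time are skipped).
            if events[i] == 1 and times[i] < times[j]:
                if risk_scores[i] > risk_scores[j]:
                    c += 1      # higher risk failed first: concordant
                elif risk_scores[i] < risk_scores[j]:
                    d += 1      # lower risk failed first: discordant
                else:
                    t_x += 1    # tied on the predictor
    return (c + t_x / 2) / (c + d + t_x)  # undefined if no pair is comparable

def event_assumed_occurred(survival_prob):
    """Apply the 50% rule: below 0.5 the event is assumed to have occurred."""
    return survival_prob < 0.5

# Toy data for illustration only
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.7, 0.7, 0.1]
c_index = concordance_index(times, events, risk_scores)
print(f"concordance: {c_index:.3f}")
print(event_assumed_occurred(0.42), event_assumed_occurred(0.50))
```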

Data Structure

The data selected for this project come from NASA's study on damage propagation modeling, specifically the engine two training and testing datasets (Saxena et al. 2008).

  • Each engine in the NASA data has an unknown amount of wear, manufacturing variation, and sensor noise

  • There are three operational setting fields (os1 to os3) and twenty-one sensor measurement fields (sm1 to sm21)

  • A status column was added to both the training and testing datasets, with 0 indicating the machine has not failed and 1 indicating the machine has failed

  • The two datasets were then merged into one combined dataset

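NASA Aircraft Engine Data

| id | time | status | os1 | os2 | os3 | sm1 | sm2 | sm3 |
|---|---|---|---|---|---|---|---|---|
| 1 | 149 | 1 | 42.0017 | 0.8414 | 100 | 445.00 | 550.49 | 1366.01 |
| 2 | 269 | 1 | 42.0047 | 0.8411 | 100 | 445.00 | 550.11 | 1368.75 |
| 3 | 206 | 1 | 42.0073 | 0.8400 | 100 | 445.00 | 550.80 | 1356.97 |
| 4 | 235 | 1 | 0.0030 | 0.0007 | 100 | 518.67 | 643.68 | 1605.86 |
| 5 | 154 | 1 | 42.0049 | 0.8408 | 100 | 445.00 | 550.53 | 1364.82 |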

Exploration

  • 519 engines in the combined data

    • 260 engines in training data

    • 259 engines in testing data

Summary Metrics

| Metric | Value |
|---|---|
| Minimum | 128.00 |
| Median | 199.00 |
| Mean | 206.77 |
| Standard Deviation | 46.78 |
| Maximum | 378.00 |
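These metrics can be computed with Python's standard library. The sketch assumes the summary describes the `time` column (the table does not name the variable) and uses only the five sample rows from the NASA Aircraft Engine Data table, so its output will not match the full-data values. It also assumes the sample standard deviation (n - 1 denominator).

```python
import statistics

# time values from the five sample rows shown in the data table
times = [149, 269, 206, 235, 154]

summary = {
    "Minimum": min(times),
    "Median": statistics.median(times),
    "Mean": statistics.mean(times),
    "Standard Deviation": statistics.stdev(times),  # sample SD (n - 1)
    "Maximum": max(times),
}

for metric, value in summary.items():
    print(f"{metric}: {value:.2f}")
```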

Visualization

Creating CPH Model

The table provides the model number, the covariates used, the AIC and BIC from the stepwise regression if applicable, and the concordance index.

| Model | Covariates | AIC | BIC | Concordance |
|---|---|---|---|---|
| model 1 | All covariates | 2449.92 | 2547.71 | 0.6956658 |
| model 2 | os3, sm3-sm5, sm8-sm9, sm15-sm18, sm20 | 2431.88 | N/A | 0.6917216 |
| model 3 | os3, sm3-sm4, sm8-sm9, sm15, sm18 | N/A | 2461.31 | 0.6833401 |
| model 4 | os3, sm13-sm14, sm19 | N/A | N/A | 0.5881404 |

The model used for the rest of the analysis includes all covariates except sm16 and sm19.
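For reference, the two selection criteria differ only in how they penalize the number of covariates. The sketch below uses the standard definitions, \(\text{AIC} = 2k - 2\ell\) and \(\text{BIC} = k\ln(n) - 2\ell\). For a Cox model \(\ell\) is the partial log-likelihood, and some implementations take \(n\) to be the number of events rather than the number of rows. The log-likelihood, \(k\), and \(n\) values are hypothetical, not taken from the fitted models.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k * ln(n) - 2 * log-likelihood."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical values for illustration only
log_lik = -1200.0   # assumed partial log-likelihood
k = 10              # assumed number of estimated coefficients
n = 500             # assumed sample size used in the penalty

aic_value = aic(log_lik, k)
bic_value = bic(log_lik, k, n)
print(f"AIC: {aic_value:.2f}")
print(f"BIC: {bic_value:.2f}")
```

Because \(\text{BIC} - \text{AIC} = k(\ln n - 2)\), BIC penalizes each extra covariate more heavily whenever \(n > e^2 \approx 7.4\), which is consistent with the BIC-selected model in the table keeping fewer covariates than the AIC-selected one.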

CPH Model Forest Plot

Checking Assumptions

Model Results

Conclusion

References

Abeysekera, W. W. M., and M. R. Sooriyarachchi. 2009. “Use of Schoenfeld’s Global Test to Test the Proportional Hazards Assumption in the Cox Proportional Hazards Model: An Application to a Clinical Study.” https://www.researchgate.net/publication/238483310_Use_of_Schoenfeld's_global_test_to_test_the_proportional_hazards_assumption_in_the_Cox_proportional_hazards_model_An_application_to_a_clinical_study.
Asghar, Naseem, Khalil Umair, and Iftikhar Uddin. 2024. “Mixture and Non-Mixture Cure Models for the Survival Analysis of SARS-CoV-2 Patients in Khyber Pakhtunkhwa, Pakistan.” Pakistan Journal of Medical Sciences 40 (8): 1841–46.
Bustan, M. Nadjib. 2018. “Cox Proportional Hazard Survival Analysis to Inpatient Breast Cancer Cases.”
Jiang, Nan, Wu Yongfa, and Chengjia Li. 2024. “Limitations of Using COX Proportional Hazards Model in Cardiovascular Research.” Cardiovascular Diabetology, no. 1: 1–2.
Klein, John P., and Melvin L. Moeschberger. 2005. Survival Analysis: Techniques for Censored and Truncated Data. 2nd ed. Springer.
Nahhas, Ramzi W. 2025. Introduction to Regression Methods for Public Health Using R. 1st ed. Chapman & Hall.
Saxena, Abhinav, Kai Goebel, Don Simon, and Neil Eklund. 2008. “Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation,” 1–9. https://doi.org/10.1109/PHM.2008.4711414.
Wang, Weiwei, Xiaotian Chang, and Feifei Lin. 2025. “Adding Salt to Foods and Risk of Incident Depression and Anxiety.” BMC Medicine, no. 1: 1–10.
Zhang, Yue, Yangyang Cheng, and Rodrigo M Carrillo-Larco. 2025. “Postpartum Depression in Relation to Chronic Diseases and Multimorbidity in Women’s Mid-Late Life: A Prospective Cohort Study of UK Biobank.” BMC Medicine 23 (1): 1–13.